Details of built-in Estimators

On this page, we introduce the built-in estimator classes in each module. In short, they all have fit, predict, and set_params methods. See the API reference for details on calling them directly, or evaluate them through the experiment class.
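To illustrate the shared interface, here is a minimal sketch; the class below is a toy estimator invented for this example, not one of the package's built-in estimators:

```python
import numpy as np

class MajorityClassEstimator:
    """Toy estimator (not part of this package) showing the shared
    fit / predict / set_params interface."""

    def __init__(self, default_label=0):
        self.default_label = default_label
        self.label_ = None

    def set_params(self, **params):
        # Update hyperparameters by keyword, as the built-in estimators do.
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y):
        # Learn the most frequent label; fall back to default_label.
        y = np.asarray(y)
        if y.size == 0:
            self.label_ = self.default_label
        else:
            values, counts = np.unique(y, return_counts=True)
            self.label_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

est = MajorityClassEstimator().set_params(default_label=1)
est.fit(np.zeros((3, 2)), [0, 1, 1])
pred = est.predict(np.zeros((2, 2)))   # -> array([1, 1])
```

The same call pattern (construct, set_params, fit, predict) applies to every estimator described below.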

About Classification

Three classical semi-supervised algorithms are implemented as baselines for safe SSL: Transductive Support Vector Machine (TSVM), the Label Propagation Algorithm (LPA), and Co-training (CoTraining).
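As a sketch of the idea behind one of these baselines, the following is a compact numpy implementation of the core LPA update (iteratively averaging neighbor labels while clamping the labeled nodes); it is an illustration of the algorithm, not the package's implementation:

```python
import numpy as np

def label_propagation(W, y, labeled_mask, n_iter=50):
    """Minimal LPA sketch on a graph with affinity matrix W."""
    classes = np.unique(y[labeled_mask])
    n, k = W.shape[0], len(classes)
    # One-hot label matrix; unlabeled rows start uniform.
    Y0 = np.full((n, k), 1.0 / k)
    for i, c in enumerate(classes):
        Y0[labeled_mask & (y == c)] = np.eye(k)[i]
    P = W / W.sum(axis=1, keepdims=True)      # row-normalized transitions
    F = Y0.copy()
    for _ in range(n_iter):
        F = P @ F                             # propagate labels to neighbors
        F[labeled_mask] = Y0[labeled_mask]    # clamp the labeled nodes
    return classes[np.argmax(F, axis=1)]

# Chain graph 0-1-2-3 with the two endpoints labeled 0 and 1.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([0, -1, -1, 1])                  # -1 marks unlabeled nodes
mask = np.array([True, False, False, True])
pred = label_propagation(W, y, mask)          # -> array([0, 0, 1, 1])
```

Each unlabeled node adopts the label that dominates among its neighbors after the propagation converges.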

About Data Quality

Two algorithms, LEAD (LargE margin grAph quality juDgement) and SLP (Stochastic Label Propagation), are implemented in this package. LEAD is the first algorithm to study the quality of the graph. The basic idea is that, given a set of candidate graphs, a high-quality graph tends to yield predictive results with a large margin of separation. Therefore, given multiple graphs of unknown quality, one should prefer the graphs with a large margin over those with a small margin, thereby reducing the chance of performance degradation. The proposed stacking method first regenerates a new SSL data set from the predictive results of GSSL on the candidate graphs, and then formulates safe GSSL as a classical semi-supervised SVM optimization on the regenerated data set.
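The stacking step can be sketched as follows; all data here are hypothetical, and the final semi-supervised SVM step is only indicated, not solved:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3   # 6 instances, 3 candidate graphs (hypothetical sizes)

# preds[j, i]: soft prediction for instance i by GSSL on candidate graph j.
preds = rng.uniform(-1, 1, size=(m, n))
y = np.array([1, -1, 1, 0, 0, 0])     # 0 marks unlabeled instances

# Stacking step: the per-graph predictions become the new m-dimensional
# representation of each instance -- the regenerated SSL data set.
X_new = preds.T                       # shape (n, m)

# Large-margin intuition on the labeled part: graphs whose predictions
# agree confidently with the known labels are preferred.
labeled = y != 0
margins = (preds[:, labeled] * y[labeled]).mean(axis=1)
best_graph = int(np.argmax(margins))

# (X_new, y) would then be handed to a semi-supervised SVM,
# which is beyond the scope of this sketch.
```

The key point is that graph quality is judged through the margin of the predictions it induces, not through the graph structure directly.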

SLP is a lightweight label propagation method for large-scale network data. A lightweight iterative process derived from the well-known stochastic gradient descent strategy reduces memory overhead and accelerates the solving process.
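The flavor of such an update can be sketched as below: instead of a full matrix product per iteration, sample one edge at a time and nudge the endpoint scores toward each other, SGD-style. This is an illustration of the idea under simplifying assumptions (binary labels in {-1, +1}), not the package's SLP implementation:

```python
import numpy as np

def slp_sketch(edges, y, labeled_mask, n_epochs=200, lr=0.5, seed=0):
    """Stochastic label-propagation sketch: per-edge SGD updates keep
    memory at O(edges + nodes) instead of storing dense propagation."""
    rng = np.random.default_rng(seed)
    f = np.where(labeled_mask, y, 0.0).astype(float)
    for _ in range(n_epochs):
        i, j = edges[rng.integers(len(edges))]
        # Gradient step on the smoothness term (f_i - f_j)^2 / 2,
        # skipping labeled nodes so their scores stay clamped.
        g = f[i] - f[j]
        if not labeled_mask[i]:
            f[i] -= lr * g
        if not labeled_mask[j]:
            f[j] += lr * g
    return np.sign(f)

# Chain graph 0-1-2-3 with the two endpoints labeled +1 and -1.
edges = [(0, 1), (1, 2), (2, 3)]
y = np.array([1.0, 0.0, 0.0, -1.0])
mask = np.array([True, False, False, True])
pred = slp_sketch(edges, y, mask)
```

Because only one edge is touched per step, the working set stays small even when the network is large.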

About Model Uncertainty

We provide two safe SSL algorithms named S4VM (Safe Semi-supervised Support Vector Machine) and SAFER (SAFE semi-supervised Regression) in this package.

S4VM first generates a pool of diverse large-margin low-density separators, and then optimizes the label assignment for the unlabeled data in the worst case, under the assumption that the ground-truth label assignment can be realized by one of the obtained low-density separators.
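The worst-case optimization can be sketched with a brute-force version on a toy problem; the candidate assignments and baseline labels below are hypothetical, and S4VM itself solves this step with efficient optimization rather than enumeration:

```python
import numpy as np
from itertools import product

def worst_case_gain(y_pred, candidates, y_svm):
    """Accuracy gain of y_pred over the inductive SVM baseline,
    in the worst case over candidate ground-truth assignments."""
    return min(np.mean(y_pred == y_true) - np.mean(y_svm == y_true)
               for y_true in candidates)

# Hypothetical low-density candidate assignments for 4 unlabeled points.
candidates = [np.array([1, 1, -1, -1]), np.array([1, -1, -1, -1])]
y_svm = np.array([1, 1, 1, 1])   # baseline inductive SVM labels

# Pick the assignment maximizing the worst-case gain (brute force here).
best = max((np.array(p) for p in product([-1, 1], repeat=4)),
           key=lambda y: worst_case_gain(y, candidates, y_svm))
# -> best is [1, 1, -1, -1], with worst-case gain 0.5
```

If the true assignment is indeed one of the candidates, the chosen labeling never performs worse than the baseline SVM, which is the safeness guarantee S4VM targets.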

For semi-supervised regression (SSR), SAFER tries to learn a safe prediction given a set of SSR predictions obtained in various ways. To achieve this goal, the safe semi-supervised regression problem is formulated as a geometric projection issue. When the ground-truth label assignment is realized by a convex linear combination of the base regressors, the proposal is provably safe and achieves the maximal worst-case performance gain.
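The geometric view can be sketched with a coarse grid search over convex weights; the base predictions below are hypothetical, and the package solves a proper projection problem rather than searching a grid:

```python
import numpy as np

# Hypothetical predictions of two base SSR learners on 5 unlabeled points.
f = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [1.2, 1.8, 3.5, 3.9, 5.4]])

def worst_dist(a):
    # Squared distance from the convex combination to the farthest
    # base prediction -- the worst case if that one were the truth.
    g = a * f[0] + (1 - a) * f[1]
    return max(float(np.sum((g - fj) ** 2)) for fj in f)

alphas = np.linspace(0.0, 1.0, 101)
a_star = min(alphas, key=worst_dist)
safe_pred = a_star * f[0] + (1 - a_star) * f[1]
```

With two base learners, the minimax combination is simply their midpoint; with more learners, finding it becomes the projection problem SAFER formulates.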

About Ensemble

We also implement an ensemble method called SafetyForest to provide a safer prediction given a set of trained models or prediction results. SafetyForest works in a similar way to LEAD; the only difference between the two is that LEAD requires its input to be predictions made on graphs, whereas SafetyForest does not.

About Wrapper

The wrapper helps wrap algorithms from third-party packages such as scikit-learn. As examples, we wrap several popular supervised learning algorithms.
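A wrapper of this kind can be sketched as a thin adapter that delegates to the wrapped model; the class names below are hypothetical, and a stand-in estimator is included so the sketch is self-contained:

```python
class ThirdPartyWrapper:
    """Hypothetical wrapper sketch: adapts any scikit-learn-style
    estimator to the fit / predict / set_params interface used here."""

    def __init__(self, estimator):
        self.estimator = estimator

    def set_params(self, **params):
        self.estimator.set_params(**params)   # delegate to the wrapped model
        return self

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X)


class ConstantModel:
    """Stand-in for a third-party estimator (predicts a constant)."""
    def __init__(self, c=0):
        self.c = c
    def set_params(self, **params):
        self.__dict__.update(params)
        return self
    def fit(self, X, y):
        return self
    def predict(self, X):
        return [self.c] * len(X)


model = ThirdPartyWrapper(ConstantModel()).set_params(c=7)
model.fit([[0], [1]], [0, 1])
pred = model.predict([[2], [3]])   # -> [7, 7]
```

Because the wrapper exposes the same fit / predict / set_params methods as the built-in estimators, wrapped models can be used anywhere a built-in estimator is expected, including the experiment class.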